original multiple choice items, were then constructed and
administered. Analysis of the resulting data indicates that
true-false test items, item for item, are less discriminating than
multiple-choice items. This gives partial support to the belief that
minute for minute a true-false test can be as reliable as a multiple
choice test. It also gives some support to the hypothesis that
there is no important difference in what the two item forms measure.
Overall results, despite their limitations, tend to strengthen rather
than weaken faith in the usefulness and value of true-false test
items. (LR)
1. Reason for the study

Although many test specialists hold true-false test items in low esteem,
a few see special virtues of efficiency and ease of preparation in them and
advocate their wider use. One of the arguments advanced in their behalf is
that multiple choice test items can be converted to true-false items without
changing what the item measures in any important way, and with possible
improvement in efficiency (reliability per hour of testing time). This
study was designed to yield data that might support or weaken that argument.

2. Procedures of the study 

An expertly constructed, highly regarded published test of natural
science was chosen as the starting point of the inquiry. The test consists
of 90 four-alternative multiple choice items. It is intended for use by
the general population of high school students in any of the high school
grades.

The investigator converted each of the multiple choice items into a
pair of true-false items, one true, the other false, both intended to test
essentially the same understanding as the original multiple choice item.
Exhibit 1 shows an item in the original multiple choice form and in revised
true-false form.

The resulting 180 true-false items were divided into two 90 item
forms, A and B, so that one member of each pair would appear in each of
the two forms. Forms A and B were administered to chance halves of a class
of 65 students enrolled in an introductory college level course in testing
and grading. While these were not the kind of students for which the test
was originally written, their understanding of natural science was not so
much better, or so much more uniform, than that of typical high school
students as to impair the usefulness of their responses for item analysis.

The more highly discriminating member of each pair was then chosen for
further study in comparison with the multiple choice forms. Using the
selected true-false items and the original multiple choice items, two
additional experimental forms of the test of natural science were constructed.
In each, the first 44 items were multiple choice items. The next 44 were
true-false items over different concepts. The limitation of the half tests
to 44 items, rather than 45, was a concession to convenience in obtaining
sub-test scores from a four-column answer sheet. Form C consisted of 44
of the 45 odd-numbered multiple choice items from the original test, plus
44 true-false items derived from the even-numbered multiple choice items of
the original test. Form D was the complement of Form C, using even-numbered
multiple choice items first, and then true-false items based on the original
odd-numbered items. Forms C and D were administered to chance halves of a
class of 102 students enrolled in an introductory college level course in
testing and grading.

3. Results of the data analysis

Table 1 presents the item composition of the two tryout and two final
forms, statistics of the score distributions, and measures of the item
discrimination and test discrimination (reliability). None of these data
bear directly on the question being investigated. They are presented for
background information.

Table 2, however, presents data bearing directly on the point at issue.
It compares the multiple-choice and true-false sections of the two final
forms. In Form C the mean index of discrimination (Mean D) of the
true-false items (.30) is only a little less than that for the multiple
choice items (.33). In Form D the true-false items looked much worse
(Mean D = .17) than the multiple choice items (Mean D = .33). The
differences in item discrimination are reflected in corresponding
differences in score reliability (K.R. 20).
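
For readers who wish to compute statistics of this kind, the following is
a minimal sketch and not the investigator's own procedure. It assumes item
responses scored 0 or 1, computes K.R. 20 using the population variance of
total scores, and computes an upper-lower index of discrimination from the
top and bottom 27 percent of examinees. The 27 percent grouping, the
function names kr20 and upper_lower_d, and the small response matrix are
all assumptions made for illustration; the report does not state how D was
computed.

```python
from typing import List


def kr20(responses: List[List[int]]) -> float:
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses[i][j] is 1 if examinee i answered item j correctly, else 0.
    """
    n = len(responses)
    k = len(responses[0])  # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    # Assumption: population variance (divide by n), not n - 1.
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum over items of p * q, with p the proportion answering correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def upper_lower_d(responses: List[List[int]], item: int,
                  fraction: float = 0.27) -> float:
    """Upper-lower discrimination index for one item.

    Assumption: groups are the top and bottom `fraction` of examinees
    ranked by total score; ties are broken by input order.
    """
    n = len(responses)
    group_size = max(1, round(fraction * n))
    ranked = sorted(responses, key=sum, reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]
    correct_upper = sum(row[item] for row in upper)
    correct_lower = sum(row[item] for row in lower)
    return (correct_upper - correct_lower) / group_size


# Hypothetical responses: 4 examinees by 3 items (not data from the study).
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(f"K.R. 20 = {kr20(responses):.2f}")
print(f"D for item 0 = {upper_lower_d(responses, 0, fraction=0.5):.2f}")
```

A mean D for a section of a form, as reported in Table 2, would then be the
average of the item-level D values over the items in that section.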
Table 2. Data on Item Forms

Test Form                         Form C            Form D
Item Form                       MC      TF        MC      TF

Mean D                                  .30       .38     .17
K.R. 20                         .81     .72       .86     .55
Adjusted K.R. 20                        .84               .71

Correlation (MC-TF)                 .92               .55
Corrected for attenuation          1.20               .80


The adjusted K.R. 20 values were obtained by applying the Spearman-Brown
formula to predict the reliability of an 88 item true-false test. The
rationale for such an adjustment is that students typically answer two
true-false items, or more, in the time required to answer one multiple
choice item such as those used in this study. In Form C the adjusted
true-false reliability (.84) is slightly higher than that of the multiple
choice items (.81). However, in Form D even the adjustment fails to bring
the true-false reliability (.71) close to that of the multiple choice
items (.86).
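
The adjustment can be checked with a short sketch. It assumes the general
Spearman-Brown formula with a length factor of 2 (44 items lengthened to
88); the function name spearman_brown is introduced here only for
illustration. The inputs are the unadjusted true-false K.R. 20 values from
Table 2.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability of a test lengthened by `length_factor`."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )


# Unadjusted true-false K.R. 20 values for the 44-item sections (Table 2).
for form, r44 in (("C", 0.72), ("D", 0.55)):
    # Doubling from 44 to 88 true-false items.
    print(f"Form {form}: {spearman_brown(r44, 2):.2f}")
# Form C: 0.84
# Form D: 0.71
```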



The bottom section of Table 2 presents the correlations between scores 
on multiple-choice and true-false items in Form C (.92) and Form D (.55).
When corrected for attenuation these correlations become 1.20 and .80 
respectively. Note that the mean of the corrected values is 1.00. 
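
The corrected values can be reproduced with the usual correction for
attenuation, dividing the observed correlation by the square root of the
product of the two reliabilities. The sketch below assumes that the
unadjusted K.R. 20 values in Table 2 were the reliabilities used, which is
an inference from the fact that they yield the reported figures; the
function name is introduced only for illustration.

```python
import math


def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Observed correlation divided by sqrt of the two reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)


# (MC-TF correlation, MC K.R. 20, TF K.R. 20) for each form, from Table 2.
form_c = correct_for_attenuation(0.92, 0.81, 0.72)
form_d = correct_for_attenuation(0.55, 0.86, 0.55)

print(f"Form C: {form_c:.2f}")                 # Form C: 1.20
print(f"Form D: {form_d:.2f}")                 # Form D: 0.80
print(f"Mean:   {(form_c + form_d) / 2:.2f}")  # Mean:   1.00
```

A corrected value above 1.00, as in Form C, is possible when the
reliability estimates are themselves subject to sampling error.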



4. Interpretation of the results 

These data confirm the expectation that, item for item, true-false test
items tend to be less discriminating than multiple choice items, though in
some cases the difference is surprisingly small. They give partial support
to the belief that minute for minute a true-false test can be as reliable
as a multiple choice test. They also give some support to the hypothesis
that there is no important difference in what the two item forms measure.
Overall the correlation between sub-tests composed of the two forms is as
high as their reliabilities will allow.
We have no good explanation, other than sampling fluctuations, for the
differences observed between Form C and Form D. Clearly it would have been
much better to have had an n of 300 for each of the final forms. It also
would have been better if the selection of items from the tryout forms could
have been based on more responses than those provided by 32 or 34 examinees.
Table 3 shows how much the indices of discrimination for the same item varied
from tryout to final forms. Values for the same item are circled. Table 3
also shows the low correlation between indices of discrimination for the
"same" item in true-false and multiple-choice form. Most of these differences
are probably attributable to instability (sampling errors) in the indices
themselves.

When the study is repeated we should, in addition to using much larger 
n's, use the true-false tryout data less mechanically. With more stable 
indices as a basis from which to work, we should do more revision of the 
true-false items, and seek qualitative as well as quantitative bases for 
the final selection. The multiple choice items against which the true-false
items were being compared were given a much more adequate tryout and much 
more extensive and careful revision. 



Table 3. Discrimination Indices for Related Items Based on the Same Content 
* 90 true-false items in each form, A and B.

** 44 multiple-choice items and 44 true-false items in each form, C and D.
Form C included the odd-numbered multiple-choice items and the
even-numbered true-false items. Form D included the others.
Exhibit 1. Sample Items

A. Original multiple choice form 

1. What enables man to live in a greater range of climates than most
other animals?

1. He is stronger than other animals

2. He is a warm-blooded animal

*3. He can control his surroundings to a greater extent

4. He eats less than other animals



B. Alternative true-false forms 

A. Man can live in a greater range of climates than most other animals
because he is warm-blooded. F 

B. Man is less dependent on his immediate environment for food and comfort 
than are most other animals. T 